feat: Perform tolerance-based comparison for lists and arrays by MariusMerkleQC · Pull Request #19 · Quantco/diffly

MariusMerkleQC · 2026-03-19T19:21:47Z

Motivation

Partially addresses #8. Sequences (lists and arrays) are compared element-wise, iterating over each position in the sequence. Each element is then compared using the standard type-aware logic, so absolute/relative float tolerances and absolute temporal tolerances all apply naturally.
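As a rough illustration of the element-wise idea, the sketch below models it in plain Python rather than with Polars expressions. The names values_close and sequences_close are hypothetical, the tolerance formula is the one from math.isclose (which may differ from the one diffly uses), and treating two missing values as equal is an assumption.

```python
import math


def values_close(lhs, rhs, abs_tol=1e-8, rel_tol=1e-5):
    """Compare two scalars: floats use tolerances, everything else exact equality."""
    if lhs is None or rhs is None:
        # Assumption: two missing values count as equal.
        return lhs is None and rhs is None
    if isinstance(lhs, float) and isinstance(rhs, float):
        return math.isclose(lhs, rhs, rel_tol=rel_tol, abs_tol=abs_tol)
    return lhs == rhs


def sequences_close(lhs, rhs, **tolerances):
    """Compare two sequences position by position with the scalar logic above."""
    if len(lhs) != len(rhs):
        return False
    return all(values_close(a, b, **tolerances) for a, b in zip(lhs, rhs))


# A difference of 1e-9 is within tolerance; a difference of 0.1 is not.
print(sequences_close([1.0, 2.0], [1.0, 2.0 + 1e-9]))
print(sequences_close([1.0, 2.0], [1.0, 2.1]))
```

The point of the sketch is only that the sequence comparison adds no tolerance logic of its own: it delegates each position to the scalar comparison.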

Maximum sequence length

An array's length (or shape for multi-dimensional arrays) is known statically from its data type. When at least one of the two compared columns is an array, its length determines the number of elements to compare. When both columns are lists, the maximum list length must be computed at runtime. This is handled by the cached property _max_list_lengths_by_column, a dictionary mapping column names to their maximum list length, populated only for columns that are pl.List in both data frames. The resolved max_list_length: int | None is then passed to condition_equal_columns().
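The resolution rule can be sketched in plain Python as follows. ArrayType, ListType, and resolve_max_length are hypothetical stand-ins (not Polars or diffly names), and taking the maximum over the rows of both frames is an assumption about how the cached property is populated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayType:
    """Stand-in for a fixed-size array data type."""
    size: int


@dataclass(frozen=True)
class ListType:
    """Stand-in for a variable-length list data type."""


def resolve_max_length(lhs_type, rhs_type, lhs_rows, rhs_rows):
    """Return the number of positions to compare for one column pair."""
    # If either side is an array, its static size decides; no data scan is needed.
    for dtype in (lhs_type, rhs_type):
        if isinstance(dtype, ArrayType):
            return dtype.size
    # Both sides are lists: scan the data (assumption: both frames, nulls skipped).
    lengths = [len(row) for row in (*lhs_rows, *rhs_rows) if row is not None]
    return max(lengths, default=0)
```

For example, resolve_max_length(ListType(), ListType(), [[1], [1, 2, 3]], [[1, 2], None]) scans the rows and yields 3, whereas passing ArrayType(3) for either side yields 3 without looking at the data.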

Sequences of different lengths

Arrays have a fixed length, so comparing two arrays of different shapes can immediately return False. In all other cases, lengths may vary row by row, which is captured in the has_same_length expression. To avoid out-of-bounds errors when indexing into shorter lists, elements are extracted with null_on_oob=True, which yields null for a missing position instead of raising. The final result combines has_same_length with elements_match (the element-wise comparison), so rows with mismatched lengths are marked as unequal.
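To see why has_same_length is needed on top of elements_match, consider this plain-Python model of a single row. get_or_none mimics the effect of null_on_oob=True; row_equal is a hypothetical name, and the sketch assumes that two nulls compare as equal.

```python
def get_or_none(seq, index):
    """Return seq[index], or None when the index is out of bounds."""
    return seq[index] if 0 <= index < len(seq) else None


def row_equal(lhs, rhs, max_length):
    """Combine the length check with the positional comparison for one row."""
    has_same_length = len(lhs) == len(rhs)
    elements_match = all(
        get_or_none(lhs, i) == get_or_none(rhs, i) for i in range(max_length)
    )
    return has_same_length and elements_match


# [1, None] and [1] agree at every position once the missing slot reads as None,
# so only the length check tells them apart.
print(row_equal([1, None], [1], max_length=2))  # False
print(row_equal([1, 2], [1, 2], max_length=2))  # True
```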

Multi-dimensional sequences

Nested sequences (e.g., lists of lists or multi-dimensional arrays) are handled recursively: outer elements are extracted positionally, then compared via the same _compare_columns logic until primitive types are reached. When both sides are lists at an inner nesting level, no max_list_length is available, so the comparison falls back to direct equality without element-wise unrolling (i.e., tolerances do not apply at inner list levels).
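The recursion and its fallback can be modelled in plain Python as below. This is only a sketch: tuples stand in for fixed-size arrays and lists for variable-length lists, compare and inner_length are hypothetical names, and a single absolute float tolerance replaces the full type-aware logic.

```python
def inner_length(lhs, rhs):
    """Length for the next nesting level, or None if it is not statically known."""
    for value in (lhs, rhs):
        if isinstance(value, tuple):  # tuple models an array: size is known
            return len(value)
    return None  # two lists (or scalars): no length available


def compare(lhs, rhs, max_length, abs_tol=1e-6):
    """Recursively compare nested sequences down to primitive values."""
    if not isinstance(lhs, (list, tuple)) or not isinstance(rhs, (list, tuple)):
        if isinstance(lhs, float) and isinstance(rhs, float):
            return abs(lhs - rhs) <= abs_tol
        return lhs == rhs
    if max_length is None:
        # Both sides are lists at an inner level: direct equality, no tolerance.
        return lhs == rhs
    if len(lhs) != len(rhs):
        return False
    return all(
        compare(a, b, inner_length(a, b), abs_tol) for a, b in zip(lhs, rhs)
    )


# Inner arrays are unrolled, so the tolerance applies.
print(compare([(1.0, 2.0)], [(1.0, 2.0 + 1e-9)], max_length=1))  # True
# Inner lists fall back to exact equality, so the same difference is a mismatch.
print(compare([[1.0, 2.0]], [[1.0, 2.0 + 1e-9]], max_length=1))  # False
```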

Changes

- Add cached property _max_list_lengths_by_column: dict[str, int]
- Introduce function _compare_sequence_columns() to compare lists and arrays with each other
- Add extensive test coverage:
  - Modify test_condition_equal_columns_list_array_{equal_exact -> with_tolerance} to reflect the updated logic
  - Build on the former test with nested sequence types in test_condition_equal_columns_nested_list_array_with_tolerance
  - Test comparison of two list columns in test_condition_equal_columns_two_lists, including empty lists, lists containing None, and None in place of a list
  - Cover length mismatches in test_condition_equal_columns_array_vs_list_length_mismatch
  - Cover mismatching array shapes in test_condition_equal_columns_two_arrays_different_shapes
  - Handle the edge cases of empty arrays and empty lists in test_condition_equal_columns_empty_arrays and test_condition_equal_columns_empty_lists, respectively

codecov · 2026-03-19T19:22:07Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (2ae4c11) to head (955517c).

Additional details and impacted files

@@            Coverage Diff            @@
##              main       #19   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           10        10           
  Lines          707       743   +36     
=========================================
+ Hits           707       743   +36


Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


diffly/_conditions.py

MariusMerkleQC · 2026-03-20T09:30:15Z

tests/test_conditions.py

+    if isinstance(lhs_type, pl.List) and isinstance(rhs_type, pl.List):
+        assert actual.to_list() == [True, False, False]


To fix this, I already had a solution that computes the maximum list length among all nesting levels in a "data type tree". For example, if you had a list of lists where the inner lists are longer than the outer lists, max_list_length would be the value of the inner list length. As this increases complexity even more, I'd like to first get this to main and implement it in a follow-up PR.
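For illustration only, the idea of taking the maximum list length across all nesting levels could look like the sketch below. It is not the solution mentioned above, which is not shown in this PR; max_nested_length is a hypothetical name, and it walks plain nested Python lists rather than Polars data types.

```python
def max_nested_length(value):
    """Return the longest list length found at any nesting level of value."""
    if not isinstance(value, list):
        return 0  # primitives contribute no length
    # Consider this level's length and the lengths found inside each element.
    return max([len(value), *(max_nested_length(item) for item in value)])


# The outer list has 2 elements, but one inner list has 3, so the result is 3.
print(max_nested_length([[1, 2, 3], [4]]))  # 3
```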

MariusMerkleQC added 2 commits March 19, 2026 20:20

feat: Perform tolerance-based comparison for lists and arrays

5ec11f7

remove multi-array

1c0050f

MariusMerkleQC self-assigned this Mar 19, 2026

MariusMerkleQC linked an issue Mar 19, 2026 that may be closed by this pull request

Properly perform floating point comparisons for structs and lists #8

Open

github-actions bot added the enhancement New feature or request label Mar 19, 2026

MariusMerkleQC added 4 commits March 19, 2026 20:35

clean up

68dc630

improve

1d4df7b

clean compare_sequence_columns

d528ecd

clean up

d7da250

MariusMerkleQC requested a review from Copilot March 20, 2026 08:30

Copilot started reviewing on behalf of MariusMerkleQC March 20, 2026 08:30

This comment was marked as outdated.


feedback copilot

955517c

MariusMerkleQC requested a review from Copilot March 20, 2026 09:18

Copilot started reviewing on behalf of MariusMerkleQC March 20, 2026 09:18

Copilot AI reviewed Mar 20, 2026


diffly/_conditions.py

Quantco deleted a comment from Copilot AI Mar 20, 2026

MariusMerkleQC commented Mar 20, 2026


MariusMerkleQC marked this pull request as ready for review March 20, 2026 09:30

MariusMerkleQC requested review from EgeKaraismailogluQC and borchero as code owners March 20, 2026 09:30

MariusMerkleQC wants to merge 7 commits into main from list_arr
